Technical report on Automatic Identification of Paedophile
Abstract
Accurate and up-to-date knowledge of the keywords entered by users who search for or provide paedophile content is a key resource for filtering purposes and for monitoring by law enforcement institutions. However, such keywords are often hidden and may change frequently, and current knowledge about them relies on manual inspection and field expertise. We explore here whether various keyword analysis methods can help improve this situation. Using a large-scale real-world collection of paedophile and non-paedophile file names, we construct lists of keywords suspected of being used as paedophile keywords. We evaluate the relevance and interest of these lists by submitting them to experts, and the results show that automatic approaches are of great interest for this task.
Similar resources
Technical report on the Automatic Detection of Paedophile Queries
Filtering or identifying paedophile queries is a key issue for law enforcement and search engines. However, these queries are in general mixed with a huge amount of other queries. Moreover, little is known about their characteristics. We address here these two issues in order to design the first tool for automatic detection of paedophile queries. Using domain expertise, we select some paedophile q...
Technical report on Maps of paedophile activity
As policy-making and law enforcement institutions generally operate at the national level, or at least at a regional level (Europe for instance), we studied geolocated recordings available in a large dataset obtained by a measurement of keyword-based queries submitted to a large P2P server. We observed that the fractions of paedophile queries may be orders of magnitude larger in some countries ...
Dynamics of Paedophile Keywords in eDonkey Queries
This technical report synthesizes the results of the analysis of paedophile keywords’ dynamics in two sets of eDonkey queries, collected during several months in 2007 and 2009 respectively. The goal of this work is to study the evolution of paedophile keywords’ frequency and popularity over several weeks (i.e. within a given dataset), as well as between the two different datasets. Moreover, spe...
Measurement and Analysis of P2P Activity Against Paedophile Content
Peer-to-peer (P2P) systems are nowadays widely used to exchange files, and it is acknowledged that they host much paedophile activity. However, current knowledge of this specific activity remains very limited, and almost no tools exist for user protection. Likewise, tools and knowledge for policy making and law enforcement are far from sufficient. The goal of the Measurement and Analysis of P2P ...
First Report on Paedophile Keywords Observed in eDonkey
This report presents our first analysis results on paedophile keywords observed in exchanges between eDonkey clients and their server. We first describe our dataset and the messages studied in this context. General statistics on the number of queries, filenames, clients and keywords are provided, before focusing on paedophile keywords appearing in user queries and/or in filenames. Statistical a...